Continuous speech recognition from ECoG
نویسندگان
چکیده
Continuous speech production is a highly complex process involving many parts of the human brain. To date, no fundamental representation that allows for decoding of continuous speech from neural signals has been presented. Here we show that techniques from automatic speech recognition can be applied to decode a textual representation of spoken words from neural signals. We model phones as the fundamental unit of the speech process in invasively measured brain activity (intracranial electrocorticographic (ECoG)) recordings. These phone models give insights into timings and locations of neural processes associated with the continuous production of speech and can be used in a speech recognizer to decode the neural data into their textual representations. When restricting the dictionary to small subsets, Word Error Rates as low as 25% can be achieved. As the brain activity data sets are fairly small, alternative approaches to Gaussian models are investigated by relying on robust, regularized discriminative models.
منابع مشابه
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
متن کاملSpeaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
متن کاملSpeaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
متن کاملBrain-to-text: decoding spoken phrases from phone representations in the brain
It has long been speculated whether communication between humans and machines based on natural speech related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones or one of a few isolated words. However, until now it remained an unsolved challenge to decode c...
متن کاملSpeech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
متن کامل